THE USE OF REGRESSION DISCONTINUITY MODEL
WITH CRITERION-REFERENCED TESTING

IN THE EVALUATION OF

COMPENSATORY EDUCATION

R.F. Boruch 
J.S. DeGracie 

1. Introduction

It would appear that within the next year and a half a relatively
uniform procedure for the evaluation of Title I programs at the local
educational agency (LEA) level will be mandated. There certainly can be
no argument against the need for more comprehensive evaluation at the LEA
level. An overwhelming argument in support of this need is provided by the
findings of Talmadge in 1974. At that time he directed a search which encompassed
some 2,000 projects conducted at LEAs, all of which had received some form
of official recognition as a success. Under close scrutiny, only a small
fraction of the 2,000 could be found to meet the selection criteria of effectiveness,
cost, availability, and replicability. These findings, together with the LEAs'
own intuitive observations, are not lost on the LEAs. If
a survey were to be conducted of the LEAs, I am sure that overwhelming
support for the need for more comprehensive evaluation at the local level
would be evident.

This paper, then, is not intended as an argument against
such a mandate, but rather as an instructive guide to some of the problems
and pitfalls which can be encountered at the local level when highly
sophisticated statistical methods are used to investigate program
effectiveness at an LEA. Again, this is not to say that these problems and
pitfalls cannot be overcome, but they should certainly be investigated as
fully as possible before a general mandate is made.
Within the last year, representatives from RMC Research Corporation
and the Northwest Regional Educational Laboratory have made presentations in Arizona
concerning the forthcoming mandate. These presentations drew heavily on
the work by Talmadge published both in the monograph "A Practical Guide
to Measuring Project Impact on Student Achievement" and in "State ESEA
Title I Reports: Review and Analysis." In both of these presentations
the models they expected to be mandated were discussed. In general,
the two presentations agreed. The only disagreement was over which
model would be most used by the LEAs. The representatives from RMC seemed
to concur with some of the previous writings of Talmadge that Model B, the
Control Group Model, would be extremely difficult to implement, since control
groups for Title I students are not normally available. I tend to agree with
this view. Here, however, I find myself at odds with my coauthor.

Dr. Boruch has addressed this topic in discussing Talmadge's work, which
seemed to reject the idea of randomized experimental tests of compensatory
programs out of hand, suggesting that their rarity is due by and large to
infeasibility at the LEA level. At that time, and as also documented in Campbell
and Boruch (1975), he discussed and gave examples of a number of successful
and unsuccessful randomized studies in education. He strongly reinforced
the point concerning the utility and feasibility of randomized field tests
of compensatory programs and of program elements. The representatives
from the Northwest Regional Educational Laboratory seemed to agree with
Dr. Boruch. They strongly urged that Model B, the Control Group Model,
be the one used. Here, however, we both find ourselves at odds with
their reasons for selecting the Control Group Model. It was stated in the
presentation that the Control Group Model could be used with equivalent or
nearly equivalent groups. It was further stated that if the groups were
not equivalent enough, analysis of covariance could be used to match the
groups. On this point we both agree with Talmadge, who stresses
the need for the treatment and comparison groups to be sufficiently similar
that they can be considered random samples from a single population.
Analysis of covariance is a technique for reducing the
error variance in an experiment; it is not a technique that can be
used to balance groups which are not similar. The theory behind the
analysis of covariance assumes that the groups are randomly selected from the same
population. Further discussion of this point can be found in Compensatory
Education: A National Debate, Vol. 3, Disadvantaged Child, New York:
Brunner/Mazel, 1970, where Campbell discusses regression artifacts in
quasi-experimental evaluations.
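The point about covariance adjustment can be illustrated with a small simulation. This sketch is ours, not the authors': it assumes a program with no true effect, a treatment group drawn from a lower-ability population than the comparison group, and pretest and posttest that each equal true ability plus independent measurement error (within-group reliability 0.5). Adjusting the posttest for the fallible pretest with a pooled within-group slope, as a covariance analysis does, still leaves a spurious negative "effect" of roughly -0.5, the kind of regression artifact Campbell describes.

```python
import random

def simulate_group(n, mean_ability, rng):
    """Return (pretest, posttest) lists; both are true ability plus independent error."""
    xs, ys = [], []
    for _ in range(n):
        t = rng.gauss(mean_ability, 1.0)    # true ability
        xs.append(t + rng.gauss(0.0, 1.0))  # fallible pretest
        ys.append(t + rng.gauss(0.0, 1.0))  # posttest; the program adds nothing
    return xs, ys

def centered_sums(xs, ys):
    """Group means plus within-group sums of squares S_xx and cross-products S_xy."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, sxy

def ancova_adjusted_difference(treated, control):
    """Treated-minus-control posttest difference adjusted by the pooled within-group slope."""
    mx_t, my_t, sxx_t, sxy_t = centered_sums(*treated)
    mx_c, my_c, sxx_c, sxy_c = centered_sums(*control)
    b_within = (sxy_t + sxy_c) / (sxx_t + sxx_c)
    return (my_t - my_c) - b_within * (mx_t - mx_c)

rng = random.Random(1975)
treated = simulate_group(5000, -1.0, rng)  # lower-ability population served by the program
control = simulate_group(5000, 0.0, rng)   # comparison group from a different population
effect = ancova_adjusted_difference(treated, control)
print(f"covariance-adjusted 'effect' with no true effect: {effect:.2f}")  # close to -0.5, not 0
```

The adjustment undercorrects because the pretest slope is attenuated by measurement error, so part of the original group difference survives and masquerades as a harmful program.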

The above discussion shows that the proposed evaluation models are
difficult to select and even more so to implement. If the original proposer
of the models, the consultants hired to assist the LEAs in selecting and
implementing the model, and other experts in the field do not agree, it can
be seen what problems will face the local educational agencies when the
mandate is implemented.

The area specifically addressed in this paper is the selection
and implementation of one of the evaluation models when the local educational
agency is heavily committed to its own criterion-referenced, objective-based
program. A number of LEAs in recent years have made significant
commitments to locally developed, objective-based, criterion-referenced
testing, which forms the backbone of district-generated classroom management
systems. The Mesa Public School District is an example of such an LEA.
For 5 years it has been in the process of moving from a total standardized
testing program to a criterion-referenced testing program. These programs
are based on objectives for the given programs developed by task forces of district
personnel, including classroom teachers, midlevel administrators, and district
curriculum people. Over the past 5 years, it is felt that this effort has
yielded a viable classroom management system which leads to the accomplishment
of the specific goals and objectives which are felt to be important,
not only at the local school level, but all the way up to the superintendency
and the board of education. The Mesa Public Schools, then, like other
local educational agencies that have similarly developed their own
criterion-referenced testing programs, are faced with the problem of selecting one
of the three proposed evaluation models and, once it is selected, of using this
model for the evaluation of their programs.

To get a jump on the Federal mandate, the Mesa Public Schools, with
the assistance of the Evaluation and Research program and the NIE Project on
Secondary Analysis at Northwestern University, attempted to select an
appropriate recommended evaluation model and to implement this model in
its program evaluation. In its selection of a model, the underlying
assumption was that the district-generated criterion-referenced testing should
furnish the necessary test data for the evaluation. This, then, left
a choice between Model B, the Control Group Model, and Model C, the
Regression Model. It was felt that in the Mesa Public Schools it was
simply not feasible to select students randomly for the program. It
was also felt that, because of the uniqueness of the students being served,
no control group was available which could be considered a random
sample from the same population as that of the treatment group. Therefore,
through a process of elimination, Model C, the Regression Model, utilizing
the theory of regression discontinuity analysis, was selected. The following
pages present the results of that analysis, together with some background
on the program itself.


As stated, this is a report of an evaluation of the Mesa Compensatory
Reading Program supported by Title I funds in the Mesa Public School
District, Mesa, Arizona. The approach used here to estimate the program's
effects on children's reading ability is the regression-discontinuity (RD)
design proposed in 1960 by Thistlethwaite and Campbell. Partly because
this is a novel application of a promising but underutilized approach to
program evaluation, we devote particular attention both to substantive
estimates of program effect and to the credibility and usefulness of the
RD approach.

In the following remarks, we first provide background information on
the Mesa Compensatory Reading Program (Section 2) and on the basic
evaluation design (Section 3). Section 4 summarizes data on reading test
scores collected under the design. Succeeding sections cover the results
of alternative competing analyses: conventional approaches based on linear
models (Section 5) and less conventional approaches based on nonlinear models
(Section 6). Section 7 is a summary, not so much of the findings of the
analysis as of the concerns that the analysis raises.
2. Background: The Mesa Compensatory Reading Program

The Mesa Public School District has operated a Reading Classroom
Management System since 1970 to diagnose, prescribe, and monitor individual
reading skills at all grade levels. The system has terminal goals,
program goals, and behavioral objectives for each skill at each grade
level. Diagnostic assessment tests are administered early and late
within each grade level; criterion-referenced tests are used for formative
evaluation.

Although reliability data have not been collected on the assessment
instruments, the test items have undergone an iterative method of item
analysis using the responses of over 12,000 students. The District's
Office of Research maintains that over this period, 1970-1975, the tests
have evolved into valid and reliable instruments for measuring student
achievement.

Reading services are provided to 25 elementary schools by 20 reading
resource teachers, 25 district reading paraprofessionals, and 35 Title I
supported reading paraprofessionals. Of the 25 schools, 11 have been
designated as Title I schools and receive additional services; i.e., in
addition to receiving a district paraprofessional, these 11 schools share
the 35 Title I paraprofessionals. All paraprofessionals
are trained with a 20-hour competency-based program. The major goal of
the reading program is to alleviate reading problems by concentrating
resources at the primary grades, with first grades receiving the top
priority. Therefore, more students are provided individual attention
at first grade than at any other grade. Second graders receive more services
than third graders, and so on, with each subsequent grade receiving less
individual service.

Those students who are identified as being educationally deprived
on the basis of the Mesa-developed criterion-referenced tests are given
assistance by the paraprofessionals for approximately one-half hour a day,
four days each week. This assistance is given in groups no larger
than five students. The students are either removed from the classroom
or, in some cases where the classroom is in an open space, moved to a
separate section of the classroom area. The total time during the
school year that the students receive assistance is approximately 28 weeks.
The first two weeks at the beginning of the school year are taken up by
training the paraprofessionals, and during the last few weeks of the
school year the paraprofessionals are released, as this time is not usually
exclusively instructional and the paraprofessionals are specifically
employed to work directly with students. The identified students at the
first-grade level usually spend the entire year with the paraprofessionals.
At the other grade levels the students are more apt to be placed back into
the classroom setting as soon as they overcome the specific deficiencies
that were identified through the criterion-referenced tests.
On average, approximately half of the students above grade one spend
the entire year with the Title I program, with the other half spending
approximately one-half of the school year with the program.

2.1 Selection of Students for Special Assistance 

All students are pretested in September with the criterion-referenced
tests created by the school district's reading program staff, with the
exception of first graders, who are administered the Murphy-Durrell Reading
Readiness test. First graders take the district criterion-referenced test
in January for further refinement of subskill needs.

The criterion for selecting students for individual instruction in
the compensatory reading program is based on test scores. Specifically,
a student beginning first grade must score 71 or lower on the
Murphy-Durrell Reading Readiness test. In grades 2 to 6, a student must score
50% or lower on the school district's criterion-referenced tests to be
eligible for special assistance from district and Title I resources.
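The nominal assignment rule can be written down as a small function. This is a sketch, not the district's procedure: the name `is_eligible` is hypothetical, and it assumes that criterion-referenced test results are recorded as percentages correct. It encodes only the rule as stated, not how strictly teachers followed it in practice.

```python
def is_eligible(grade: int, score: float) -> bool:
    """Nominal eligibility rule for the compensatory reading program (hypothetical helper).

    Grade 1: Murphy-Durrell Reading Readiness score of 71 or lower.
    Grades 2-6: 50 percent or lower on the district criterion-referenced test
    (assumes the score is passed in as a percentage correct).
    """
    if grade == 1:
        return score <= 71
    if 2 <= grade <= 6:
        return score <= 50.0
    raise ValueError(f"the rule only covers grades 1-6, got grade {grade}")

print(is_eligible(1, 71))    # True: exactly at the cutting point
print(is_eligible(3, 50.5))  # False: just above the cutting point
```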

At the 11 Title I schools, students identified as needing extra
assistance are assigned to a reading paraprofessional. Using the test
results, the reading resource teacher prescribes appropriate activities,
which the paraprofessional implements. The reading resource teacher
monitors this instruction weekly and adjusts it according to student progress.
The Title I student thereby receives additional services above and beyond
the classroom and district resources.

3. Evaluation Design 
The main substantive objectives of the evaluation are to determine
whether the Mesa Compensatory Reading Program exerts a notable effect on
children's reading ability and to estimate the magnitude of the effect.
Reading ability here is defined operationally as scores achieved by
students on the tests. The main methodological objectives are to better
understand the benefits and limitations of the regression-discontinuity
design.
The basic RD design was developed for those cases in which some
treatment (an award, a program) is offered to individuals on the basis
of a meritocratic criterion and there is some need to estimate effects.
This eligibility criterion must be a measurable continuum, such as economic
need, educational need, and so forth. And, in the simplest application,
individuals must be assigned strictly on the basis of this eligibility;
e.g., children scoring below a certain score on a reading test
receive the program, and those scoring above the cutting point do not. The
preprogram eligibility score must be related in a known way (e.g., linearly)
to the post-program score in the absence of any program effect.

The post-program score is the dependent variable in such analyses.
Assuming that the regression of this dependent variable on eligibility
is linear in the absence of any program, one then looks for a discontinuity
in the observed regression to infer program effects. That is, the
regression of post-program reading scores on pre-program scores for the
program recipient group will differ from the corresponding regression
line for the nonrecipient group, provided that the program has an effect
(Figure 1b). If there is no effect, both groups will be described well
by the same regression line (Figure 1a).
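A minimal simulation of the logic just described, under assumptions that are ours rather than the study's: pretest scores normal with mean 50 and SD 10, a sharp cutting point at 50, a linear pre-post relation, and a constant (additive) program effect of 5 points for children at or below the cut. Fitting a separate least-squares line on each side and comparing their heights at the cutting point recovers the built-in discontinuity; the helper names are hypothetical.

```python
import random

def ols(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rd_discontinuity(xs, ys, cut):
    """Height of the recipient line (X <= cut) minus the nonrecipient line, at the cut."""
    below = [(x, y) for x, y in zip(xs, ys) if x <= cut]
    above = [(x, y) for x, y in zip(xs, ys) if x > cut]
    a1, b1 = ols(*zip(*below))
    a2, b2 = ols(*zip(*above))
    return (a1 + b1 * cut) - (a2 + b2 * cut)

rng = random.Random(1960)
CUT, TRUE_EFFECT = 50.0, 5.0
xs = [rng.gauss(50.0, 10.0) for _ in range(4000)]
# Posttest: common linear trend, plus the program effect below the cut, plus noise.
ys = [10.0 + 0.8 * x + (TRUE_EFFECT if x <= CUT else 0.0) + rng.gauss(0.0, 5.0) for x in xs]
print(f"estimated discontinuity at the cut: {rd_discontinuity(xs, ys, CUT):.2f}")
```

With the effect set to zero the two fitted lines coincide up to sampling error, which is the Figure 1a situation; a nonadditive effect would show up as different slopes rather than a simple vertical gap.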
4. The Data: Descriptive Statistics and Adherence
to the Design

Reading test scores were available for students in the first, third,
and fifth grades on the instruments described in Section 2. The
statistical summary of these data is given in Table 1 and includes the
mean, variance, skewness, and kurtosis for each sample of recipients and
nonrecipients within grade level, for pre-program test scores (X) and
post-program scores (Y).

4.1 Skewness

Although Thistlethwaite and Campbell (1960) do not seem to recognize
it, the skewness statistic serves as a check on the process generating
the RD data. If a sharp cutting point is used on the pretest and the
overall distribution is symmetric and roughly normal in shape, we
would expect scores of program recipients to be negatively skewed
(bunched near the cutting point) and nonrecipient scores to be
positively skewed.
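A quick numerical check of this argument, assuming (our choice, for illustration) a normal pretest distribution with mean 50 and SD 10 and a sharp cutting point at its center. The `skewness` helper uses the simple moment estimator: the mean cubed deviation divided by the cubed population standard deviation.

```python
import random
import statistics

def skewness(values):
    """Moment estimator of skewness: mean cubed deviation over the cubed population SD."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

rng = random.Random(42)
scores = [rng.gauss(50.0, 10.0) for _ in range(20000)]
cut = 50.0  # assumed sharp cutting point at the center of a symmetric distribution
recipients = [s for s in scores if s <= cut]
nonrecipients = [s for s in scores if s > cut]
print(f"all scores:    {skewness(scores):+.2f}")         # near zero
print(f"recipients:    {skewness(recipients):+.2f}")     # negative: bunched near the cut
print(f"nonrecipients: {skewness(nonrecipients):+.2f}")  # positive
```

Each half of a normal distribution split at its mean has skewness of magnitude close to 1, so departures from this pattern in real data, such as negatively skewed nonrecipient scores, point to something other than strict assignment at the cut.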

For the posttest, we would expect scores which are initially negatively skewed
to become less negatively skewed if the treatment has an effect.
That is, students would become more spread out in their ability (this
assumes that the effect is not completely additive). The nonrecipient group
scores would also become less skewed, unless the posttest has a low ceiling,
in which case we might expect the positive skew to become more negative.

For the data at hand, the distribution of program recipients' scores is
what one might expect. The scores are skewed negatively, suggesting
that a sharp cutting point has been used, and the negative skew decreases
from pretest to posttest, probably because of treatment effects, random
errors of measurement, and other factors. This is true for each grade level
examined except the third, where the level of skewness does not change.

The sample distributions for the nonrecipients run counter to
expectations, however. With strict adherence to the RD approach and with
a roughly symmetric distribution of scores on X, one would expect the
nonrecipient sample to have a positive skew or possibly little or no
skew. But in the data at hand, all samples are negatively skewed at
pretest (X); the skew is notable for the first grade and negligible for the
fifth. This suggests that some data may be missing, that students were
not assigned to the program strictly on the basis of eligibility scores, and
that the tests may have a low ceiling, especially for the fifth-grade group.

4.2 Variances

For truncated distributions of fallible observations, one would
expect some increase in variance from pretest to posttest, and indeed this
is reflected in the data. The coefficient of variation behaves quite
erratically and is uninformative.
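Why truncation on a fallible pretest should inflate posttest variance can be seen in a simulation sketch with assumed parameters: true ability with variance 1, independent measurement errors with SD 0.6 on each test, no program effect at all, and selection of everyone scoring at or below 0 on the pretest. Selection acts on the pretest, error included, so the selected group's pretest variance is squeezed; its posttest carries fresh error and is not truncated directly, so it spreads back out.

```python
import random
import statistics

rng = random.Random(7)
pre, post = [], []
for _ in range(20000):
    ability = rng.gauss(0.0, 1.0)
    x = ability + rng.gauss(0.0, 0.6)  # fallible pretest
    y = ability + rng.gauss(0.0, 0.6)  # posttest, no program effect
    if x <= 0.0:                       # sharp cutting point on the pretest
        pre.append(x)
        post.append(y)

print(f"selected group variance, pretest:  {statistics.variance(pre):.2f}")
print(f"selected group variance, posttest: {statistics.variance(post):.2f}")
```

Under these assumptions the pretest variance of the selected group is roughly 0.5 and the posttest variance roughly 0.9, so an increase of this kind is expected even when the program does nothing.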

4.3 Adherence to the Design 

The counterintuitive statistics for skewness imply a failure
to adhere to a strict cutting point, and a need to review the information
we have on the assignment of children to the program.

Of the 25 elementary schools in the district, 11 received Title I funds
during 1973-74 on the basis of need. Need here is defined by the low
average income level of families within the district, according
to a weighted mean of the number of students identified under free lunch,
ADC, and the 1970 U.S. Census.

Within a school, the nominal cutting points for assignment to services
are, as indicated earlier, 71 for the first grade and 50 for the third
and fifth grades. However, there was no strict adherence to the cutting
point. In fact, some program recipients received very high scores on the
pretest, and there was a substantial number of individuals who scored
above the nominal cutoff but did receive services. The proportion
of students at each grade level in each subgroup is given below.

                              First    Third    Fifth
                              Grade    Grade    Grade
Eligible Recipients            .17      .32      .15
Ineligible Recipients          .45      .11      .20
Ineligible Nonrecipients       .38      .57      .65
N                              347      485      358

The number of individuals with high scores who receive services suggests
that teachers are attempting to serve as many students as possible.
Most such students actually need the service; however, if need is defined
solely in terms of the criterion, many do not. There may be several
reasons for this phenomenon. Reading program teachers may feel compelled
to make contact, however brief, with as many students as possible to
satisfy some vague idea that the more students they serve, the more valuable
their contribution will appear to be. The "program recipient" label
itself may be misleading insofar as some of these students may receive
only brief attention, enough, say, to establish further that they need
little or no help. At this stage of the research we know little about
which explanation is true. Other data on the duration or frequency of
service to a student are essential for establishing an explanation, or at
least for establishing the extent to which ineligible recipients are receiving
nominal tutoring.

In any event, we conclude that adherence to the original design
is best for the third and fifth grades, where respectively 11% and
20% of the children are ineligible recipients, i.e., children with high
pretest scores who were nevertheless assigned to the program.
Adherence is worst for the first grade. The implication of nonadherence
is that the original design models, and the analyses determined by those
models, cannot be used without modification.

5. Linear Regression Approach 

5.1 Eligible Recipients, Ineligible Nonrecipients 

The analyses in this section are based on the X,Y points for
eligible program recipients and the corresponding points for program
nonrecipients who were ineligible for the program. Data on children who
were ineligible for the program, i.e., who scored above the cutting point
on their reading pretest, but received the program are put aside temporarily.

One of the simplest approaches to analyzing data from an RD setup
is to assume that the posttest is linearly related to the pretest within each
group. That is, one assumes

$$Y_{i1} = \alpha_1 + \beta_1 X_{i1} + e_{i1}, \qquad e_{i1} \sim (0, \sigma_1^2)$$

and

$$Y_{i2} = \alpha_2 + \beta_2 X_{i2} + e_{i2}, \qquad e_{i2} \sim (0, \sigma_2^2)$$

for the program recipient group (1) and the nonrecipient group (2),
respectively. To assess the program's impact, assume that under null
conditions the regression lines are identical:

$$H_0: \alpha_1 = \alpha_2, \quad \beta_1 = \beta_2.$$

The test of the hypothesis is clear-cut (see, for example, Chow, 1960;
Gulliksen & Wilks, 1954). If the hypothesis is rejected and the other
assumptions hold, we may infer that the program exerted an influence on
slope, intercept, or both parameters, and then conduct other tests
on the data. Linear fits are illustrated in Figures 2-4.
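For readers who want to reproduce such a test, the sketch below computes the usual F statistic for identical regression lines by comparing the residual sum of squares of one common line with that of separate lines (the Chow-type comparison cited above). The function names and toy data are ours, and the statistic assumes a common error variance in the two groups. Its degrees of freedom, 2 and n1 + n2 - 4, have the same form as those reported in 5.12 (e.g., 2 and 213 for the first grade); critical values still come from an F table.

```python
def rss_line(xs, ys):
    """Residual sum of squares about the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return syy - sxy ** 2 / sxx

def identical_lines_f(x1, y1, x2, y2):
    """F and (numerator, denominator) df for H0: alpha1 = alpha2, beta1 = beta2.

    Assumes a common error variance in both groups.
    """
    rss_separate = rss_line(x1, y1) + rss_line(x2, y2)
    rss_common = rss_line(list(x1) + list(x2), list(y1) + list(y2))
    df_den = len(x1) + len(x2) - 4
    f = ((rss_common - rss_separate) / 2) / (rss_separate / df_den)
    return f, (2, df_den)

# Toy data: group 2 (higher X) lies on group 1's line in one case, 5 points higher in the other.
x1, y1 = [1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1]
x2 = [6, 7, 8, 9, 10]
same_line = [6.1, 6.8, 8.2, 9.1, 9.9]
shifted = [y + 5 for y in same_line]
print(identical_lines_f(x1, y1, x2, same_line))  # small F, df (2, 6)
print(identical_lines_f(x1, y1, x2, shifted))    # very large F
```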

5.11. $H_0: \sigma_1^2 = \sigma_2^2$. A preliminary examination suggests that
variances differ from group to group. The usual tests of hypothesis
suggest that the conditional variances do indeed differ for the recipient
and nonrecipient groups. Specifically, we employ the usual tests for
equality of variances to find:

First Grade    F = 3.57    df = 58, 155
Third Grade    F = 3.48    df = 154, 276
Fifth Grade    F = 4.91    df = 53, 230

which are each significant (two-tailed F, p < .05). Variation about the
linear regression lines is consistently greater for the program recipient
groups. (Note that variance about some other fitted curve may be
homogeneous; we consider this possibility below.)
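A sketch of the variance comparison, assuming (as the reported degrees of freedom suggest) that each group's residual variance is the mean square about its own fitted line on n - 2 df. The function names and toy data are hypothetical; the F ratio is referred to an F table with the returned df.

```python
def residual_mean_square(xs, ys):
    """Residual variance about the least-squares line and its df (n - 2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return (syy - sxy ** 2 / sxx) / (n - 2), n - 2

def variance_ratio_f(x1, y1, x2, y2):
    """F = s1^2 / s2^2 for H0: sigma1^2 = sigma2^2, with (numerator, denominator) df."""
    s1, df1 = residual_mean_square(x1, y1)
    s2, df2 = residual_mean_square(x2, y2)
    return s1 / s2, (df1, df2)

x1, y1 = [1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1]
x2, y2 = [6, 7, 8, 9, 10], [6.1, 6.8, 8.2, 9.1, 9.9]
print(variance_ratio_f(x1, y1, x2, y2))  # F about 0.67 on (3, 3) df
```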

5.12. $H_0: \alpha_1 = \alpha_2, \beta_1 = \beta_2$. If we choose to ignore variance
differences in this particular case, we find that the fitted regression
line for recipients differs notably from the line for nonrecipients,
regardless of grade. The raw statistics are presented in Table 2. The
F statistics based on the null hypothesis given above are:

First Grade    F = 8.89 with 2, 213 df;
Third Grade    F = 10.69 with 2, 430 df;
Fifth Grade    F = 1.52 with 2, 283 df.

The usual alpha levels are not as advertised, on account of
the heterogeneity in variance.

It is clear that slopes within grades differ notably, so that the
effects of the program may not be completely additive, as the models here
imply. The slope difference for fifth graders is not substantial.

Conditional on the models, using the nonrecipient group as a
standard, and ignoring ineligible recipients entirely, we would be forced
to conclude the following from this analysis.

First Grade. Students who are low on pretest scores are positively
affected by the program; students who are near the cutting point are not
affected or are affected negatively. This follows from regarding the
regression for nonrecipients as a standard and examining the whole line,
not just the elevation of the line.

Since slopes differ between groups, using elevation as an indicator
of treatment effect is difficult. If we take the mean level of Y of the
recipient group and compare it with the value predicted from the
nonrecipient line, we conclude that the effect is positive. If we consider
only the elevation of Y for nonrecipients at the cutting point, we must
conclude that there are either no effects or negative effects.

Any of these inferences may be wrong due to (a) possible floor/ceiling
effects, (b) selection effects on the nonrecipient regression line,
or (c) both factors.
Third Grade. Students in the recipient group who are low on pretest
scores appear to be negatively affected by the program, if the nonrecipients'
regression line is taken as a standard. The further away from the cutting
point they fall, the worse off they appear. Again, these conclusions
follow from considering the whole regression line rather than elevation alone.

Since slopes again differ between groups, using elevation alone as
an indicator of program effect is difficult. If we take the mean level
of Y for recipients relative to the value predicted from nonrecipients to
estimate effects, we must conclude that the overall program impact is
negative. The impact based on prediction at the cutting point is null
or negative.

Again, these inferences may be wrong due to (a) ceiling and/or
floor effects, (b) selection affecting the nonrecipient line, or (c)
both.

Fifth Grade. Students who are near the cutting point are affected
positively or negligibly by the program; those whose pretests are very
low are affected negatively. The standard here is the nonrecipients'
complete regression line.

Considering elevation only, we conclude that, on average,
the mean level of Y is reduced for recipients relative to predictions
made from the nonrecipient line. The contrary is true if we focus on
the cutting point.

Again, ceiling effects and selection in these data may be critical
and may undermine these conclusions.
5.13. Within-school Regressions. The pooled linear regression
lines for grades 1, 3, and 5 for eligible recipients and nonrecipients
are difficult to interpret. If the conventional model and approach are
espoused, the program effect would seem to be negative.

One possible problem with the approach is misspecification of the
models in the conventional approach. In particular, the children
are students at some 11 different schools. It is possible that most
students in the program recipient group are students of one cluster of
schools, and most of those in the nonrecipient group are students in
another cluster. If this is the case, even in the absence of any program
effect, there may be differences in the regression of posttest on pretest
which are a function of school differences rather than program differences.
To assay this possibility, a within-schools analysis is justified.

The estimates of slope and intercept for each school, for
eligible recipients and nonrecipients, are given in Table 4. Also given
is the size of the ineligible recipient group.

a. In all but three of the schools, the sample size within the
recipient group is only marginally adequate. The following inferences stem
from the larger sample groups.

b. The slopes for recipient groups always exceed those for
nonrecipient groups, suggesting that misspecification of the model
because of gross school-related variables is not really the problem.
Even within school, slopes differ above and below the cutting point.

c. The higher-slope phenomenon occurs when there are very few
ineligible recipients (as in schools 6 and 8) as well as when there are
a number of such recipients (e.g., schools 1 and 9).
d. Items b and c above raise the distinct possibility that the
regression line is simply not linear. The nonlinearity might be
caused by ceiling effects.

e. Items c and d also suggest that selection may affect slope
uniformly and regardless of the number of selectees, i.e., of ineligible
recipients. In particular, a peculiar selection strategy used by
teachers in assigning ineligible kids to programs may influence the
relation between Y and X for the nonrecipients, since the points removed
from that group are those lower on X. It may influence the
regression of Y on X for the selected group and make it different from
the regression for the unselected group.

Selection effects cannot influence the eligible recipient group,
though: all deserving kids get the program.

How can we get a picture of how selection affects the nominal
nonrecipient slopes? One option is to compare the distributions on X of the
ineligible recipients and recipients. With substantial overlap, the
effect is likely to be small; with little overlap, the effect could be
large. Also, some outside explanation (e.g., by teachers)
could be helpful.

f. Finally, it appears that

(i) if a selection effect is operating, it operates for
all schools;

(ii) if a ceiling effect is operating, it operates for
all schools;

(iii) or both operate to produce slope differences.
g. Items a through f assume that the impact of the program is strictly
additive. If the impact is not additive, slopes will change as a
consequence, and given the slope differences found above, the impact
would then have to be nonadditive for each school.

Assuming nonadditivity of effect still makes the program look worse.
One would expect an increased slope with more effective treatment, and
indeed that is what occurs. However, mean levels go down, suggesting an
overall decline in ability: the better students get better, and the worse
students get much worse.

5.14. $H_0: \beta_1 = \beta_2$. The slope differences between lines for
eligible recipients and nonrecipients are marked within both the third
and first grades. The usual F statistic for testing differences between
slopes is a crude indicator of the level of that difference:

Third Grade    F = 21.5 with 1 and 430 df;

First Grade    F = 17.8 with 1 and 213 df.

Again, because variances are heterogeneous, the usual alpha levels are
not as advertised. If we use the Cochran-Cox approach to testing the
difference given heterogeneous residuals, we find these statistics
significant.
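The paper does not spell out how the Cochran-Cox approach was applied to slopes, so the following is only a sketch under our own assumption: the textbook Cochran-Cox approximation for means, adapted by using each slope's squared standard error (residual mean square over S_xx) as its weight. The test statistic is the slope difference over the root of the summed squared standard errors, and its critical value is the weighted average of the tabled t values for each group's n - 2 df. Those tabled values are passed in because the Python standard library has no t distribution; the function names are hypothetical.

```python
import math

def slope_and_se2(xs, ys):
    """Least-squares slope and its squared standard error, s^2 / S_xx, with s^2 on n - 2 df."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    s2 = (syy - b * sxy) / (n - 2)
    return b, s2 / sxx

def cochran_cox_slope_test(x1, y1, x2, y2, t_crit1, t_crit2):
    """Compare two slopes without assuming equal residual variances.

    t_crit1, t_crit2: tabled t values for n1 - 2 and n2 - 2 df at the chosen alpha.
    Returns (t statistic, Cochran-Cox critical value, significant?).
    """
    b1, w1 = slope_and_se2(x1, y1)
    b2, w2 = slope_and_se2(x2, y2)
    t_prime = (b1 - b2) / math.sqrt(w1 + w2)
    t_crit = (w1 * t_crit1 + w2 * t_crit2) / (w1 + w2)
    return t_prime, t_crit, abs(t_prime) > t_crit

x = [1, 2, 3, 4, 5]
steep = [2.1, 3.9, 6.2, 7.9, 10.1]  # slope about 2
flat = [1.1, 1.9, 3.2, 3.9, 5.1]    # slope about 1
# 3.182 is the two-tailed 5% t value for 3 df.
print(cochran_cox_slope_test(x, steep, x, flat, 3.182, 3.182))
```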

5.15. Deviations from predicted values at the cutting point.
Still another way of appraising the impact of the reading program is to
examine the performance of program recipients whose pretest scores lie in
the vicinity of the cutting point. Presumably, if the program exerts an
effect on these individuals and if the regression line for nonrecipients
can be taken as a reference, program effects will be reflected in the
extent to which recipients' actual posttest scores deviate
from predictions based on the regression line.

More specifically, we use as a basis for prediction the regression line based on nonrecipient data:

$$Y_i = \alpha + \beta X_i + e_i, \qquad e_i \sim (0, \sigma^2),$$

for which

$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$$

is the prediction equation. We substitute the mean value of X for individuals at or near the cutting point into the equation and so estimate the mean Y for this marginal group. To formally assay the difference between a predicted value, $\hat{Y}$, and an actual value, $\bar{Y}$, we use

$$t = \frac{\bar{Y} - \hat{Y}}{\widehat{\mathrm{SE}}(\bar{Y} - \hat{Y})}$$

with N - 2 degrees of freedom, N being the number of nonrecipients on which the line is based.
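The comparison just described can be sketched in a few lines. The source's expression for the standard error is not legible, so this sketch assumes the conventional one for comparing the mean of m new observations at $\bar{X}_m$ with a fitted line, $s^2(1/m + 1/N + (\bar{X}_m - \bar{X})^2/S_{xx})$, where $s^2$ is the nonrecipient residual variance. It further assumes the marginal recipients share that residual variance. Function names are hypothetical.

```python
import math


def fit_line(x, y):
    """OLS fit; returns (alpha, beta, residual variance, mean of x, Sxx, n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - beta * mx
    s2 = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y)) / (n - 2)
    return alpha, beta, s2, mx, sxx, n


def marginal_deviation_t(x_nr, y_nr, x_marg, y_marg):
    """t for the marginal recipients' mean posttest vs. the NR-line prediction.

    Returns (t, df) with df = N - 2, N being the number of nonrecipients.
    """
    alpha, beta, s2, mx, sxx, n = fit_line(x_nr, y_nr)
    m = len(x_marg)
    xbar_m = sum(x_marg) / m
    ybar_m = sum(y_marg) / m
    y_hat = alpha + beta * xbar_m  # prediction at the marginal group's mean pretest
    # Assumed standard error: mean of m new observations vs. the fitted line.
    se = math.sqrt(s2 * (1 / m + 1 / n + (xbar_m - mx) ** 2 / sxx))
    return (ybar_m - y_hat) / se, n - 2
```

With 278 third-grade nonrecipients the degrees of freedom come out as 276, and with 232 fifth-grade nonrecipients as 230, the values reported below.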

Testing the hypothesis that predicted and actual values are identical yields no evidence against it. The reading programs appear to add nothing to the level of performance of students near the cutting point in the first, third, or fifth grades. In particular, the statistics are:

First grade: t = …, df = …

Third grade: t = -.048, df = 276

Fifth grade: t = .337, df = 230

Remarks: One reason for scrutinizing recipients whose pretest scores fall near or at the cutting point is the suspicion that these children might receive the most attention in a special program. Teachers might regard them as most promising, more easily rehabilitated or taught, and so forth. This suspicion is reinforced by the interaction effect, i.e. the different slopes of the regression lines from group to group. However, the evidence from the tests just described suggests that this "most malleable, best treated" sequence probably does not occur in any grade.

5.16. Deviations from predicted values; deviations selected a posteriori. Now suppose that program staff can dedicate substantial time and energy to only a few students, given the large number of students in each program. The supposition immediately suggests that, rather than assume that all students labelled as "recipients" got special attention, we should search for outliers, i.e. marked deviations from the regression line. Such outliers might reflect positive effects, when teachers focus much greater attention on a few children; or they might reflect negative effects, when, for example, teachers ignore some low-calibre students or label them as such.

The third- and fifth-grade data do support this view in a limited way. For if we look at deviations from the (control) regression line, we find some significant departures. In the fifth grade, a cluster of 7 students with marked positive deviation and a cluster of 5 students with negative deviation yielded the following t ratios (for tests of deviation from the line):

t = 3.64 with df = 230, X̄ = 39.4, Ȳ = 93.47;

t = -4.10 with df = 230, X̄ = 31.28, Ȳ = 23.80.

For a cluster of 5 students with positive deviations and a cluster of 9 students with negative deviations, all in the third grade, we have:

t = 2.27 with df = 275, X̄ = 21.80, Ȳ = 89.20;

t = -5.63 with df = 276, X̄ = 24.44, Ȳ = 26.67.
We conclude that at least some students are affected substantially by 
being in the reading program. A few appear to profit greatly; still others 
appear to be inhibited markedly by the program. 

Remarks. Recall that earlier tests showed that the variance about the (linear) regression line differed from group to group. The outliers chosen here for more intensive examination appear to be important in producing the heterogeneity of variance, though they are not the only cause.

Note also that we have attached no particular p value to any of the statistics above. If the potential deviations had been identified beforehand and the tests constructed then, p levels would be as advertised in ordinary tables, and each statistic would deviate significantly at least at the .025 level (two-tailed). Because the deviations were chosen a posteriori, however, we know that the usual p values are inappropriate. The actual p values are greater than .025, but we are unable to compute them. The test is suspect in this respect.
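Why tabled p values fail for clusters chosen after looking at the data can be illustrated with a small Monte Carlo sketch (an illustration, not the authors' procedure). Under a hypothetical null world with one true line and no program effect, it compares how often the cluster t exceeds a nominal critical value when the cluster is the k largest positive deviations versus k recipients picked at random. The line, noise level, group sizes, pretest ranges, and function names are all assumptions, and the standard error is the same assumed form used for marginal-group comparisons.

```python
import math
import random


def fit_line(x, y):
    """OLS fit; returns (alpha, beta, residual variance, mean of x, Sxx, n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - beta * mx
    s2 = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y)) / (n - 2)
    return alpha, beta, s2, mx, sxx, n


def cluster_t(fit, xs, ys):
    """t for a cluster's mean deviation from the control-line prediction."""
    alpha, beta, s2, mx, sxx, n = fit
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    se = math.sqrt(s2 * (1 / m + 1 / n + (xbar - mx) ** 2 / sxx))
    return (ybar - (alpha + beta * xbar)) / se


def top_residuals(fit, xr, yr, k, rng):
    """Select, after looking at the data, the k largest positive deviations."""
    alpha, beta = fit[0], fit[1]
    order = sorted(range(len(xr)), key=lambda i: yr[i] - alpha - beta * xr[i], reverse=True)
    return order[:k]


def random_cluster(fit, xr, yr, k, rng):
    """Select k recipients without looking at their scores."""
    return rng.sample(range(len(xr)), k)


def exceed_rate(select, reps=500, n=150, k=5, crit=1.97, seed=1):
    """Share of null replications in which the selected cluster's t exceeds crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Hypothetical null world: one true line, no program effect anywhere.
        xc = [rng.uniform(50, 100) for _ in range(n)]  # controls above the cut
        yc = [20 + 0.7 * a + rng.gauss(0, 10) for a in xc]
        xr = [rng.uniform(20, 50) for _ in range(n)]  # recipients below the cut
        yr = [20 + 0.7 * a + rng.gauss(0, 10) for a in xr]
        fit = fit_line(xc, yc)
        idx = select(fit, xr, yr, k, rng)
        if cluster_t(fit, [xr[i] for i in idx], [yr[i] for i in idx]) > crit:
            hits += 1
    return hits / reps
```

`exceed_rate(random_cluster)` should land near the nominal one-sided rate of about .025, while `exceed_rate(top_residuals)` is close to 1 even though no program effect exists, which is why the a posteriori t ratios above cannot be read against ordinary tables.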

5.17. Double extrapolation: differences between predicted Y's at the margin, with predictions based on ER, NR, IR, and NR + IR. Sween (1971) and Campbell have recommended that, in the simplest case, one examine predicted values of Y at the cutting point to help establish the existence of an effect. In particular, one predicts a Y from the regression line for the program recipient group ($\hat{Y}_{X_0,R}$) at the margin ($X_0$), and one predicts a Y at the same value of $X_0$ but using the nonrecipient line (giving $\hat{Y}_{X_0,NR}$).

We adopt a similar but more elaborate strategy in this section. Specifically, we compare predictions of Y when predicted values at $X_0$ are based on equations from

Eligible recipients and nonrecipients 

Eligible recipients and ineligible recipients 

Eligible recipients and the combination of nonrecipients and 
ineligible recipients 

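Algebraically, we examine

$$\hat{Y}_{ER} = \hat{\alpha}_{ER} + \hat{\beta}_{ER} X_0 \quad \text{versus} \quad \hat{Y}_{NR} = \hat{\alpha}_{NR} + \hat{\beta}_{NR} X_0,$$

$$\hat{Y}_{ER} \quad \text{versus} \quad \hat{Y}_{IR} = \hat{\alpha}_{IR} + \hat{\beta}_{IR} X_0,$$

$$\hat{Y}_{ER} \quad \text{versus} \quad \hat{Y}_{NR+IR} = \hat{\alpha}_{NR+IR} + \hat{\beta}_{NR+IR} X_0,$$

using, where possible, a t statistic, since the variance of these predicted values can be estimated.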
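A comparison of two predicted values at the cutting point can be sketched in pure Python. For each line the variance of the predicted mean at $X_0$ is taken to be $s^2(1/n + (X_0 - \bar{X})^2/S_{xx})$, and the critical value uses Cochran-Cox weighting (weights equal to the two prediction variances) to recognize unequal variances. Exactly how the authors applied the Cochran-Cox prescription to predicted values is not stated, so treat this as an assumption. Tabled critical t values (with $n_a - 2$ and $n_b - 2$ df) are passed in to keep the sketch dependency-free; function names are hypothetical.

```python
import math


def fit_line(x, y):
    """OLS fit; returns (alpha, beta, residual variance, mean of x, Sxx, n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - beta * mx
    s2 = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y)) / (n - 2)
    return alpha, beta, s2, mx, sxx, n


def predict_at(fit, x0):
    """Predicted mean Y at x0 and the estimated variance of that prediction."""
    alpha, beta, s2, mx, sxx, n = fit
    return alpha + beta * x0, s2 * (1 / n + (x0 - mx) ** 2 / sxx)


def compare_at_cut(fit_a, fit_b, x0, tcrit_a, tcrit_b):
    """t for Y_hat_a(x0) - Y_hat_b(x0), plus a Cochran-Cox style critical value."""
    ya, va = predict_at(fit_a, x0)
    yb, vb = predict_at(fit_b, x0)
    t = (ya - yb) / math.sqrt(va + vb)
    # Critical value: tabled t values weighted by each prediction's variance.
    crit = (va * tcrit_a + vb * tcrit_b) / (va + vb)
    return t, crit
```

For the comparisons listed above, `fit_a` would be the eligible-recipient line and `fit_b` the NR, IR, or pooled NR + IR line, each evaluated at the same cutting point.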

The first comparison is most direct when the design is adhered to perfectly. In the current data, such an estimator is biased to the extent that the nonrecipient sample is biased by selective assignment of ineligible students to the program. The second comparison is of interest in that it can provide us with some information about the effect of selection on regression lines. The third comparison will yield an estimate of the joint effect of treatment and selection.

The result of testing the equality of the predicted Y values at $X_0$ for each comparison listed above, for each grade level, is unremarkable. In brief, predictions at the margin do not differ. The maximum t value of -1.26 (for the first-grade students) is significant only at the 20% level; all remaining t's are considerably smaller, in the range -.15 to .38. (Incidentally, the criterion t value is constructed under the prescription given by Cochran and Cox to recognize inequality of variances.)

We infer from all this that the hypothesis of "no program effect" at the cutting point is a tenable one for each grade. Again, the analysis is based on linear models and recognizes no effect of ceiling or floor on test scores.

5.18. An Approach to Analysis which Recognizes Ceiling Effects.

It is clear that ceiling effects can complicate interpretation of RD data. Indeed, ceiling effects, if unrecognized, can lead to analyses which make program effects look harmful when in fact they are negligible (Appendix I). This section offers a tentative approach to data analysis which recognizes ceiling effects and avoids biases in estimates of program effect.

Consider Figure X.1, which represents a null condition. The dotted lines represent fitted regressions; the solid lines represent the relation between posttest and pretest, including a ceiling on posttest. The vertical line again represents the cutting point.

The display emphasizes that, in principle at least, the symptoms of negligible effects are:

(a) a small negative difference, $\bar{Y}_R - \bar{Y}_{NR}$, between means (projected or actual) of the R and NR groups at the cutting point;

(b) a greater slope in the R group than in the NR group;

(c) a lower bound (floor) for the R group at the chance-level score.

Note also that the relative slopes of R and NR are predictable under this null condition, if the point of discontinuity in the true regression can be identified and a few assumptions are made. That is, under null conditions, the slope for R and a segment of the true slope for NR will be identical; one is observable (R), and we may assume that the segment for NR is the same as that for R. Given this information, some reasonable guess as to the point of discontinuity (i.e. the ceiling), and a few simplifying assumptions, it is a matter of algebra to compute an estimate of the complete regression line for the NR group. If this algebraic estimate differs much from the observed line, then we will know that there is some inconsistency, i.e. that the null condition is not fairly represented by the data.
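The "matter of algebra" can be sketched numerically. Assuming (hypothetically) that under the null the R-group line $\alpha_R + \beta_R X$ continues unchanged into the NR pretest range, that a guessed ceiling $Y_0$ truncates it, and that residual noise can be ignored, the NR line one would expect to observe is the least-squares fit to $\min(\alpha_R + \beta_R X, Y_0)$ over the NR pretest scores. Function names are hypothetical.

```python
def ols(x, y):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def implied_nr_line(alpha_r, beta_r, ceiling, x_nr):
    """Hypothetical helper: the NR regression line expected under the null.

    Assumes the R-group line continues unchanged above the cutting point,
    that posttest scores are truncated at `ceiling`, and no residual noise.
    """
    truncated = [min(alpha_r + beta_r * x, ceiling) for x in x_nr]
    return ols(x_nr, truncated)
```

Comparing the returned intercept and slope with the observed NR estimates (Table 3) gives the consistency check described above; a large discrepancy suggests that the null condition does not fit the data, or that the guessed ceiling is wrong.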

Consider Figure X.2, which represents a situation in which there is a notable treatment effect exerted at least at the cutting point. In this case, it is possible to discriminate between $H_0$ and $H_1$ by verifying that the difference between (projected or actual) values of $\bar{Y}_R$ and $\bar{Y}_{NR}$ at the cutting point is positive. It is a weak test in the sense that a substantial negative difference characterizing the null condition may have been overcome by the program to produce a positive effect.

It may be possible to examine $\bar{Y}_R$ relative to an estimated slope segment (solid line) for the NR group. That is, given the observed slope for the NR group, a reasonable guess as to where the flattening begins, and a simplifying assumption, it would be possible to compute an algebraic estimate of the true line segment's slope. Comparing the observed value to the projected line segment at the cutting point would produce a more powerful test, but one which is likely to be imprecise.
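One way to make that computation concrete, under a simplifying assumption of our own rather than one stated in the source: suppose the true NR line reaches the ceiling $Y_0$ exactly at the guessed point of flattening $x_k$ and residual noise is ignored, so observed scores follow $Y = Y_0 + b \min(X - x_k, 0)$. The least-squares slope of that expression is $b$ times the slope of $\min(X - x_k, 0)$ on X, so the true segment slope is the observed slope divided by that attenuation factor. Function names are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def true_segment_slope(beta_obs, x_nr, x_kink):
    """Undo ceiling attenuation, assuming Y = Y0 + b * min(X - x_kink, 0)."""
    shape = [min(x - x_kink, 0.0) for x in x_nr]
    attenuation = ols_slope(x_nr, shape)  # 1.0 if no NR score exceeds x_kink
    if attenuation == 0:
        raise ValueError("guessed kink lies at or below every NR pretest score")
    return beta_obs / attenuation


def projected_at_cut(ceiling, b_true, x_kink, x0):
    """Value of the estimated true NR segment at the cutting point x0 (< x_kink)."""
    return ceiling + b_true * (x0 - x_kink)
```

The projection from `projected_at_cut` is what $\bar{Y}_R$ at the cutting point would be compared with. Its precision depends heavily on the guessed kink, which is one reason the text expects such a test to be imprecise.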

Figure X.3 illustrates a situation in which the program effect is additive, that is, exerted uniformly along the full range of pretest scores in the R group. One symptom here is again the elevated position of $\bar{Y}_R$ relative to $\bar{Y}_{NR}$ at the cutting point. A second symptom is the closer match between $\bar{Y}_R$, computed at the midpoint of $X_R$, and the projection of Y based on the regression line computed from the NR group. Again both tests are weak, the first for the reason described in the preceding paragraph. The second is weak because the vertical distance between $\bar{Y}_R$ and a value estimated from the NR group at $\bar{X}_R$ depends on the magnitude of the ceiling effect. If most members of the NR group scored at or near the ceiling, that distance would be appreciable unless the treatment effect was quite large.

Figure X.4 illustrates a situation in which treatment effects are exerted only in the lower range of the X variable. The symptom of an effect is elevation of posttest scores for children with low scores on the pretest; the elevation is above chance-level scoring, which is one standard for an optimistic (nonconservative) test. The slopes are more informative, in that if the slope for R is less than that for the NR group, it must follow that the program exerted an influence on low-scoring children. There is no other competing algebraic interpretation, though there may be competing empirical ones.

Figure X.5 represents a situation in which program effects are strongest for children who score low on the pretest and weakest for children scoring high on the pretest. Three symptoms determine the inference. First, the (projected or actual) $\bar{Y}_R$ at the cutting point equals or exceeds $\bar{Y}_{NR}$. Second, the slope for the R group equals or is less than the slope for the NR group. Third, the intercept at X = 0 for the R group regression line equals or exceeds the intercept for the NR group.

The symptoms of negative program effects are illustrated in Figure X.6. Here the general elevation of the regression is reduced relative to what it would be under null conditions. Estimating what the null condition would be is again possible only if the point at which ceiling effects begin can be guessed at. The same perspective can be used to examine the possibility of negative effects occurring only for the most able children (Figure X.7).

Negative effects on the least able children will be no more detectable, at least unless floor effects become influential. From Figure X.8, it is evident that such negative effects will be demonstrated by a steep slope for the R group relative to the NR group and a negative (projected or actual) difference $\bar{Y}_R - \bar{Y}_{NR}$ at the cutting point.

Unless one tries to estimate null regression lines using a plausible (guessed) value of the ceiling, it is impossible to discriminate between this negative effect and null conditions. It may not be worth the trouble if other evidence or theory suggests that negative program effects are implausible.
5.2 Ineligible Program Recipients

Recall that the original design plan was not carried out completely: ineligible students received services. To understand the implications for analysis, we need to determine how closely the ineligible recipients resemble the other groups.

The ineligible program recipients are much more similar to program nonrecipients than they are to eligible program recipients. Nonetheless, the difference between the ineligibles and the nonrecipients is still notable and varies from grade to grade.

In particular, pretest means for ineligibles are consistently smaller than, but close to, pretest means for nonrecipients in the first, third, and fifth grades (Tables 1-2). Posttest means show a similar pattern. The differences are significant in each case (using the Cochran-Cox test for equality of means with unequal variances).
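The Cochran-Cox comparison of means can be sketched from summary statistics alone. The statistic is $t' = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$, and its critical value is the average of the two groups' tabled t values (with $n_1 - 1$ and $n_2 - 1$ df) weighted by $s_i^2/n_i$. The function name is hypothetical, and the tabled critical values are passed in to avoid a statistics-library dependency.

```python
import math


def cochran_cox(mean1, var1, n1, mean2, var2, n2, tcrit1, tcrit2):
    """Cochran-Cox t' for two means with unequal variances.

    tcrit1, tcrit2: tabled critical t values with n1 - 1 and n2 - 1 df.
    Returns (t_prime, critical_value).
    """
    w1, w2 = var1 / n1, var2 / n2  # squared standard errors of the two means
    t_prime = (mean1 - mean2) / math.sqrt(w1 + w2)
    crit = (w1 * tcrit1 + w2 * tcrit2) / (w1 + w2)
    return t_prime, crit


# Third-grade pretest: ineligible recipients (Table 2) vs. nonrecipients (Table 1),
# with two-sided .05 critical values of about 2.009 (50 df) and 1.969 (277 df).
t_prime, crit = cochran_cox(61.94, 78.42, 51, 74.51, 134.03, 278, 2.009, 1.969)
```

With these table values t' is about -8.8 against a critical value of about 2.0, consistent with the statement that the difference is significant.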

Variances of pretest and posttest scores also differ across ineligible recipients, nonrecipients, and eligible recipients of the program. Again, the ineligible recipients and nonrecipients are most similar with respect to variation of scores; however, the differences are still significant and in the same direction regardless of grade level. The variance of pretest scores of ineligible recipients is, except in the first grade, always smaller than the variance for the nonrecipient group; evidently, the ineligibles are being selected on implicit teacher criteria such that they constitute a more homogeneous group. The posttest scores for ineligible recipients always exhibit more variability than the nonrecipients' scores, probably because of ceiling effects on the latter.
A visual inspection of the linear regression parameters for the various groups again suggests that ineligible recipients resemble the nonrecipients more closely than they resemble the eligible recipients (Tables 3 and 5). It is clear, however, that in the first grade the regression lines for the ineligible recipient and nonrecipient groups differ, primarily with respect to elevation (the slope difference is small). For the third and fifth grades, the differences are very small. Tests of the hypothesis that the intercept and slope are identical yield F ratios in the 3-8 range, but residual variances differ a bit: residual variances for the ineligibles are consistently about half again as large as the residuals for the nonrecipient data, so the conventional test's alpha level is not as advertised.

The IR, NR, and ER groups differ with respect to simple descriptive parameters and regression parameters. Despite the close visual similarity of the IR and NR groups, they too differ from one another and from grade to grade. Given that the IR and NR groups are "comparable" in the sense that they resemble one another, and that one group receives the program while the other does not, one might think that a covariance approach could be used to estimate the program's effect. The approach is inappropriate, however, in part because required assumptions about homogeneity of slopes and residuals do not hold, and also because the underlying models are almost by definition misspecified. That is, something other than pretest scores is being used as the basis for assigning ineligible students to the program.

Now, despite the IR-NR differences, we might choose to pool these data on the grounds that any evident differences are entirely a function of the selection process rather than of any "treatment" effect. Doing so may permit us to make more powerful tests. If the treatment does indeed influence the IR group, then tests which compare these data with ER are more likely to be weak.

Tests of the hypothesis that the residual variance about the regression line for the eligible recipient group is equal to the residual variance for the combined nonrecipient and ineligible recipient group result in the following F ratios:

First Grade: F = 2.11 with 58 and 285 degrees of freedom;

Third Grade: F = 2.89 with 154 and 327 degrees of freedom;

Fifth Grade: F = 4.11 with 53 and 299 degrees of freedom.
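These F ratios can be approximately reproduced from the tables. Assuming that the variances in Table 1 use an n - 1 divisor and that residual variances use n - 2, the residual variance for a group is $s_Y^2 (n-1)(1-R^2)/(n-2)$; dividing the eligible-recipient value by the combined-group residual variance in Table 6 gives the F ratio, with n - 2 numerator degrees of freedom. Small discrepancies are expected because R is rounded to two places. The function name is hypothetical.

```python
def residual_variance(var_y, r, n):
    """Residual variance about a fitted line (divisor n - 2), recovered from the
    posttest variance (divisor n - 1), the correlation R, and the group size n."""
    return var_y * (n - 1) * (1 - r ** 2) / (n - 2)


# Eligible recipients: posttest variance and n from Table 1, R from Table 3;
# denominators: residual variance of the combined NR + IR group from Table 6.
f_first = residual_variance(511.20, 0.09, 60) / 244.44
f_third = residual_variance(291.56, 0.51, 156) / 75.01
f_fifth = residual_variance(369.23, 0.37, 55) / 78.88
```

The three ratios come out at about 2.11, 2.89, and 4.12, in line with the values reported above.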

The F statistics suggest that residuals differ notably and that residual variance for eligible recipients is consistently greater than residual variance for the combined NR and IR group. The implications are many and complicated. Heterogeneity of variances may be induced by:

program effects on recipients and, to a lesser extent, on the combined group;

poor fit of linear regression to one or both categories of data, above and below the cutting point;

a natural relation between the variance of observations and the X variable;

other reasons.

If we ignore the heterogeneity and proceed to apply a conventional F test for

$$H_0: (\alpha_{ER}, \beta_{ER}) = (\alpha_{NR+IR}, \beta_{NR+IR}),$$

we obtain relatively large F statistics for grades 1 and 3, and a … F for the fifth grade:

First Grade: F = 10.… with 2 and 343 degrees of freedom;

Third Grade: F = 7.1… with 2 and 481 degrees of freedom;
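The conventional test that two groups share one regression line (same intercept and slope) compares a single line fitted to the pooled data with separate lines for each group, giving F with 2 and $n_1 + n_2 - 4$ degrees of freedom. The sketch below is a minimal pure-Python version with hypothetical names; like the conventional test, it assumes equal residual variances, which the F ratios just reported show do not hold.

```python
def line_sse(x, y):
    """Residual sum of squares from a least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return syy - sxy ** 2 / sxx


def coincidence_f(x1, y1, x2, y2):
    """F test that two groups share one line. Returns (F, 2, n1 + n2 - 4)."""
    sse_separate = line_sse(x1, y1) + line_sse(x2, y2)  # two intercepts, two slopes
    sse_single = line_sse(x1 + x2, y1 + y2)  # one intercept, one slope
    df2 = len(x1) + len(x2) - 4
    f = ((sse_single - sse_separate) / 2) / (sse_separate / df2)
    return f, 2, df2
```

For the first grade, with 60 eligible recipients and 157 + 130 = 287 students in the combined NR + IR group (Tables 1 and 2), the denominator degrees of freedom are 60 + 287 - 4 = 343, as reported above.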
6. Results of Fitting Quadratic Functions

First Grade (Figure 5)

The collinearity of X and X² makes this analysis useless. Within the recipient group and the nonrecipient group, correlations between X and X² exceed .99. Graphical results suggest that a quadratic is not a good fit. The interpretation based on the quadratic fit is nonsense.

Third Grade (Figure 6)

The collinearity problem makes this analysis useless. The graphical results appear more sensible. Interpretation suggests that the program is harmful, a conclusion we do not accept at this stage.

Fifth Grade (Figure 7)

Again, collinearity of X and X² makes this analysis suspect. The graphical results look a bit more sensible than the preceding two analyses. Moreover, the chart suggests that the program exerted a positive effect. Again, we do not accept this conclusion for the fifth grade on the basis of this evidence alone.

Evidently, fitting a quadratic does not solve the problem of ceiling effects at all. The fifth-grade data look decent, but the first- and third-grade data are confusing. The ineligible recipients may account for the peculiar results.
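The collinearity complaint is easy to check: over a pretest range far from zero, X and X² are nearly linear functions of one another. The sketch below uses hypothetical pretest scores from 60 to 100, roughly the nonrecipient range. Centering X before squaring, a standard remedy that the authors did not use, removes the correlation for a symmetric set of scores; it addresses only the numerical problem, not the ceiling effect described above.

```python
import math


def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


x = [float(v) for v in range(60, 101)]  # hypothetical pretest scores
raw = corr(x, [a * a for a in x])  # X vs X squared: nearly collinear
mean_x = sum(x) / len(x)
centered_x = [a - mean_x for a in x]
centered = corr(centered_x, [a * a for a in centered_x])  # zero for symmetric scores
```

Here `raw` exceeds .99, matching the correlations reported for the first grade, while `centered` is zero because the hypothetical scores are symmetric about their mean.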



7. Summary

The point that is made in this paper does not need belaboring. It would appear that the work shown in using regression discontinuity evaluation at a local educational agency is monumental. It is realized, as was pointed out in the presentation by the Northwest Regional Educational Laboratory when they presented the models in Arizona, that forms will be designed so that the evaluator at the LEA can simply take from box 3, place in box 6, multiply by box 7, etc. It is difficult to accept that such a process will produce the desired results. If the evaluators at the LEAs are given only those directions and asked to use a technique such as the one presented in this paper, it would be hard to believe that the resulting information would be much better than that which Talmadge found in his study in 1974. It would appear from this example that the technique is not only very difficult to implement but also difficult to interpret. Also, criterion-referenced data by their very nature may not lend themselves, and certainly do not lend themselves easily, to the regression model. Criterion-referenced tests are designed so that a high proportion of the students can attain mastery at the completion of the program. In the case of the Mesa Public Schools, the underlying program is designed around a minimal set of objectives for each specific grade level. It is felt that these are the most important objectives and are the objectives that all of the students should accomplish by the end of the year. Enrichment above this minimal set of objectives is then left up to the classroom teacher. It is apparent that the better students go far beyond this minimal set of objectives and in fact may enter a given grade level with these objectives already mastered. It is, then, up to the teacher, through other prescriptive methods, to determine an individualized course of instruction for that student. However, because of this we see that a high proportion of the students do attain a high degree of mastery by the end of the year.

The authors would again like to re-emphasize their commitment to 
comprehensive and valid evaluation at the local educational agency level. 
They would also like to acknowledge the significant efforts and work by 
Mr. Talmadge. As Dr. Boruch stated at the Conference on Minority Group
Testing: 

"Mr. Talmadge has attacked an enormously complicated problem with
energy and with an awareness of some lessons hard won during the 
past few years. We admire his fortitude in doing so. The paper 
itself touches on many of the techniques which have produced biased 
results in evaluative social research and so deserves recognition 
for its scope and tentative style. The paper also represents an
improvement over the practices of many school districts and contracts
in the matter of estimating the impact of educational programs. With 
a few remarkable exceptions those practices have resulted in a dismal 
array of evaluation reports and findings which badly undercut rather 
than enhance school districts' own efforts to improve education in 
difficult settings." 

In conclusion, the authors would like to reiterate their concerns for the implementation of the selected evaluation models at the LEA level. This is not to say that comprehensive evaluation should be precluded at this level, but only that there is a need for valid and reliable information from the local school districts concerning the evaluation of compensatory education.



Figure 2. Posttest regression on pretest for first-grade students: eligible recipients (0) and nonrecipients (+), with nominal cutting point of X = 71.
Figure 4. Posttest regression on pretest for fifth-grade students: eligible recipients (0) and nonrecipients (+), with nominal cutting point at X = 50.

Figure 5. Quadratic regressions of posttest on pretest for first-graders.

Figure 6. Quadratic regression of posttest on pretest for third-graders.
Table 1

Descriptive Statistics for Scores of First-, Third-, and Fifth-Grade Children on Reading Tests

                                   N     Mean   Variance  Skewness  Kurtosis
First Grade
  Eligible Recipients      X      60    60.12     91.77     -.98       .79
                           Y      60    50.05    511.20     -.19      -.88
  Nonrecipients            X     157    93.63     48.31    -1.45      1.32
                           Y     157    87.43    174.86    -1.58      2.28
Third Grade
  Eligible Recipients      X     156    33.36    144.54     -.43      -.88
                           Y     156    65.07    291.56     -.55       .43
  Nonrecipients            X     278    74.51    134.03     -.13      -.94
                           Y     278    88.66     77.03    -2.05      8.3
Fifth Grade
  Eligible Recipients      X      55    41.51     50.11    -1.02      1.14
                           Y      55    64.26    369.23     -.59       .62
  Nonrecipients            X     232    7?.13    185.32     -.46      -.97
                           Y     232    87.91    141.67    -1.40      1.25
Table 2

Descriptive Statistics for Ineligible Recipients

                        N     Mean   Variance  Skewness  Kurtosis
First Grade     X     130       …        …         …         …
                Y     130    70.52    349.93     -.70       .21
Third Grade     X      51    61.94     78.42     1.23       .80
                Y      51    81.26    157.15     -.92      1.11
Fifth Grade     X      70    70.14    151.75      .20     -1.15
                Y      70    82.04    159.49     -.97       .67
Table 3

Estimates of Parameters in Linear Model (Y = α + βX + e, e ~ N(0, σ²)) Fitted to First-, Third-, and Fifth-Grade Students

                        α        β       R     SE(α)   SE(β)
First Grade
  Recipients          37.80     .20     .09    18.79     .31
  Nonrecipients       12.05     .81     .42    12.99      …
Third Grade
  Recipients            …       .73     .51     3.49      …
  Nonrecipients       63.77     .33     .44     3.09     .04
Fifth Grade
  Recipients          22.44    1.00     .37    14.58     .35
  Nonrecipients       37.90     .64     .73     3.11     .04
Table 4

Estimates of (Linear) Regression Parameters for Within-School Analyses

              N        α        β
50-1          9      18.83     .99     16.17
51-1         46      58.99     .78      7.61
52-1         12
50-2          3     133.67   -1.45     17.41
51-2         63      58.77     .42      6.34
52-2          7
50-5          3      36.15     .55      7.48
51-5         43      18.13     .97      9.24
52-5          1
50-6          7     -32.03    2.22     17.47
51-6         11      18.91     .85      6.60
52-6          1
50-8          9       8.57    1.46     18.69
51-8         27      22.83     .82     10.33
52-8          2
50-10        13      39.33     .62     16.34
51-10        12      55.40     .51     12.88
52-10         8
Table 5

Linear Functions Fitted to Data on Ineligible Program Recipients

                  α       β      R     SE(α)   SE(β)
First Grade     11.84    .69    .31    15.83    .18
Third Grade     50.80    .49    .35    11.86    .19
Fifth Grade     39.85    .61    .59     7.17    .10
Table 6

Linear Functions Fitted to Data on Combined Ineligible Program Recipients and Nonrecipients

                  α        β      R    σ² (residual)   Mean X   Var(X)
First Grade    -12.98    1.03    .50      244.44         89.9     75.9
Third Grade     59.70     .38    .47       75.01         72.6    145.9
Fifth Grade     34.21     .69    .74       78.88         76.0    207.0
Appendix: Effect of Ceiling on Observation of Y in RD Analysis: Graphical Interpretation

1. In the simplest case, Y above a certain level will be unmeasurable, and because all units are then assigned the maximum score, we have

$$Y = \alpha + \beta X \quad \text{for } Y < Y_0,$$

$$Y = Y_0 \quad \text{for } Y \ge Y_0,$$

as in Figure 1. What we observe is a discontinuous regression line, actually two lines.

2. If, in the case just described, one tried to fit a single line to the observations, the slope and intercept estimates would appear as in Figure 2. The fitted line

$$Y = \alpha' + \beta' X$$

would be such that

$$0 < \beta' < \beta \quad \text{and} \quad \alpha < \alpha' < Y_0.$$
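Item 2 can be checked with a small noise-free example. The true line, ceiling, and pretest range below are hypothetical; scores that the true line would place above $Y_0$ are recorded as $Y_0$, and a single least-squares line is fitted to the result.

```python
def ols(x, y):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


alpha, beta, ceiling = 20.0, 0.8, 80.0  # hypothetical true line and ceiling Y0
x = [float(v) for v in range(0, 101)]  # hypothetical pretest scores
y_observed = [min(alpha + beta * a, ceiling) for a in x]  # scores above Y0 recorded as Y0
alpha_fit, beta_fit = ols(x, y_observed)
```

The fitted slope comes out near 0.67 and the fitted intercept near 23.8, so both inequalities in item 2 hold: the slope is attenuated toward zero and the intercept is pulled up toward the ceiling.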

3. Now suppose further that there is some point on the X axis which, for theoretical or design reasons, is thought to define two sections of the data to which different regressions must be fitted. For example, the cutting point in Figure 3 might indicate the separate sets of points to which regressions must be fitted in a regression discontinuity analysis. Again, if one ignores or does not recognize the ceiling effects, fitting the regression to the left side will yield

$$Y = \alpha' + \beta' X,$$

where $0 < \beta' < \beta$ and $0 < \alpha' < Y_0$, and for the right-hand side,

$$Y = \alpha'' + (0)X = Y_0.$$

In general, there will be a gap between the end of the line $Y = \alpha' + \beta' X$ and the line $Y = \alpha'' = Y_0$. The greater the distance between the cutting point and the point of natural, unrecognized discontinuity, the greater the gap.

Figure 4 illustrates the consequence of placing the cutting point below rather than above the point of natural inflection in the line, i.e. below the discontinuity produced by the ceiling. In this instance, below the cutting point we have fitted

$$Y = \alpha' + \beta' X,$$

where $\alpha' = \alpha$ and $\beta' = \beta$, and above the cutting point we have fitted

$$Y = \alpha'' + \beta'' X,$$

where $0 < \beta'' < \beta$ and $0 < \alpha'' < Y_0$. Again, there is a gap between lines at the cutting point, this time showing the right-hand curve at a higher level.

The point of all this is that a ceiling effect, if unrecognized, will produce biases in estimates of slope and intercept parameters.

The consequence of this in a regression discontinuity analysis can be dangerous. Suppose, for example, that all the diagrams really reflect only null conditions, and curves are fitted as in Figures 3 and 4. The inference one would draw from the fitted lines in Figure 3 (if the left-hand side represents program recipients) is that the program harmed its recipients, since (a) the average elevation of the left-hand side is depressed, at the average X and at the margin, relative to the right-hand line, and (b) the slope increased as a consequence of treatment (when in fact no such change occurred).
FIGURES FOR APPENDIX (Figures 1 to 4). Legend: true relation between Y and X; observed relation between Y and X (with ceiling); fitted regression of Y on X; cutting point on X axis, with regressions fitted above and below the cutting point; ceiling on Y (Y axis).